[FLINK-38943][runtime] Support Adaptive Partition Selection for RescalePartitioner and RebalancePartitioner #27446
base: master
Conversation
Force-pushed from 046c85b to e5ea873.
Hi @davidradl @X-czh, could you take a look? Thanks a lot.
Force-pushed from e5ea873 to b95259b.
@flinkbot run azure
@RocMarshal Thanks for the quick contribution. I'll take a look later this week.
Force-pushed from b95259b to 750f38c.
RocMarshal left a comment:
Thanks @davidradl for the review. I updated the related lines based on your comments. PTAL ~
Force-pushed from 54ae4fb to 9b62a25.
Force-pushed from 9b62a25 to 20a10b8.
```java
            ResultPartitionWriter writer, long timeout, String taskName, int maxTraverseSize) {
        super(writer, timeout, taskName);
        this.numberOfSubpartitions = writer.getNumberOfSubpartitions();
        this.maxTraverseSize = Math.min(maxTraverseSize, numberOfSubpartitions);
```
I am wondering why we need both `numberOfSubpartitions` and `maxTraverseSize`. Why not set `numberOfSubpartitions` to `Math.min(maxTraverseSize, numberOfSubpartitions)` and remove `private final int maxTraverseSize;`? Then you would not need to check `maxTraverseSize` in the logic, as `numberOfSubpartitions` would always be the minimum, accounting for `maxTraverseSize`.
Also, in a previous response to a review comment you said `maxTraverseSize` could not be 1, but it could end up as 1 if `numberOfSubpartitions == 1` due to this `Math.min`. We should probably check for the `numberOfSubpartitions == 1` case and not do adaptive processing.
Hi @davidradl, thanks for your comments.

> I am wondering why we need both `numberOfSubpartitions` and `maxTraverseSize`. Why not set `numberOfSubpartitions` to `Math.min(maxTraverseSize, numberOfSubpartitions)` and remove `private final int maxTraverseSize;`? Then you would not need to check `maxTraverseSize` in the logic, as `numberOfSubpartitions` would always be the minimum, accounting for `maxTraverseSize`.
`numberOfSubpartitions` represents the number of downstream partitions that can be written to.
`maxTraverseSize`, on the other hand, represents the maximum number of candidate partitions that the partition selector compares when performing rescale or rebalance.
Based on the above description, suppose numberOfSubpartitions = 6 and maxTraverseSize = 2. In this case, the program would inevitably stop writing data to 4 downstream partitions, which is not the expected behavior.
> Also, in a previous response to a review comment you said `maxTraverseSize` could not be 1, but it could end up as 1 if `numberOfSubpartitions == 1` due to this `Math.min`. We should probably check for the `numberOfSubpartitions == 1` case and not do adaptive processing.
When the number of downstream partitions is 1, setting maxTraverseSize to a value greater than 1 is meaningless, because there is only one downstream partition. No additional traversal or comparison is needed, and the only available partition can be selected directly.
In addition, when the number of downstream partitions is not 1 and the user explicitly sets maxTraverseSize to 1, this means that under this strategy the next partition is selected directly without any load calculation, and data is written to it immediately. This behavior is equivalent to not enabling the adaptive partition feature.
Therefore, when we previously said that maxTraverseSize cannot be 1, we meant that users are not allowed to configure this option with a value of 1, not that the internal maxTraverseSize can never become 1. As explained above, when the internal maxTraverseSize becomes 1, it is because the number of downstream partitions is 1.
The number of downstream partitions is not always determined by user operations. For example, when a streaming job enables the adaptive scheduler, the parallelism of each operator or task may differ, which can lead to an uncontrollable number of downstream partitions for certain tasks. As a result, maxTraverseSize inside the writer may become 1 in such cases.
Please correct me if I'm wrong. Any input is appreciated!
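The selection scheme described above can be sketched as follows. This is a minimal illustration, not Flink's actual implementation: the class name, the `loads` array, and the `selectPartition` method are hypothetical; it only shows the interaction between a round-robin cursor that cycles over all subpartitions and a bounded comparison window of `maxTraverseSize` candidates, so that no partition is permanently starved even when the window is smaller than the number of subpartitions.

```java
// Hypothetical sketch of bounded-window adaptive partition selection.
// Not Flink's API: names and the loads[] input are illustrative only.
public class AdaptiveSelectorSketch {
    private final int numberOfSubpartitions;
    private final int maxTraverseSize; // size of the comparison window
    private int cursor = 0;            // round-robin start position

    public AdaptiveSelectorSketch(int numberOfSubpartitions, int maxTraverseSize) {
        this.numberOfSubpartitions = numberOfSubpartitions;
        // Internally the window can shrink to 1 when there is only one subpartition.
        this.maxTraverseSize = Math.min(maxTraverseSize, numberOfSubpartitions);
    }

    /**
     * Picks the least-loaded of the next maxTraverseSize candidates starting at
     * the cursor, then advances the cursor by one so that over time the window
     * sweeps across all subpartitions.
     */
    public int selectPartition(long[] loads) {
        int best = cursor;
        for (int i = 1; i < maxTraverseSize; i++) {
            int candidate = (cursor + i) % numberOfSubpartitions;
            if (loads[candidate] < loads[best]) {
                best = candidate;
            }
        }
        cursor = (cursor + 1) % numberOfSubpartitions;
        return best;
    }
}
```

With `numberOfSubpartitions = 6` and `maxTraverseSize = 2`, the cursor still visits all six partitions, which is why collapsing the two fields into one would change behavior; and with `numberOfSubpartitions = 1`, the loop body never runs and the single partition is returned directly, matching the degenerate case discussed above.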
What is the purpose of the change
[FLINK-38943][runtime] Support Adaptive Partition Selection for RescalePartitioner and RebalancePartitioner
Brief change log
Introduce the following:
Verifying this change
This change added tests and can be verified as follows:
The benchmark for this change is here
Does this pull request potentially affect one of the following parts:
@Public(Evolving): (yes / no)
Documentation